ABSTRACT



Procedures used to establish the comparability of scores derived from the College Board Admissions Testing Program (ATP) computer adaptive SAT prototype and the paper-and-pencil SAT are described in this report. Both the prototype, which is made up of Verbal and Mathematical computer adaptive tests (CATs), and a form of the paper-and-pencil test were administered to just over 500 examinees using a random groups counterbalanced design. Both linear and equipercentile procedures were used for equating in each of the separate testing orders (paper-and-pencil then CAT, or CAT then paper-and-pencil). Data were not pooled across the orders because, owing to administrative problems, the groups were not randomly equivalent. The linear procedure was chosen for each test (Verbal or Mathematical) in each order, and the results from the two orders were averaged. The final Verbal and Mathematical CAT conversions were quite similar to the paper-and-pencil conversions, although the CAT and paper-and-pencil conversions for each test did differ by as much as 20 scaled score points in certain regions of the scale.
Deriving Comparable Scores for Computer Adaptive and Conventional Tests:

An Example Using the SAT 



Daniel R. Eignor 



INTRODUCTION 

Recent psychometric and systems advances, coupled with the availability of powerful yet relatively inexpensive microcomputers, have made computer adaptive testing (CAT) for large-scale testing programs a reality at Educational Testing Service (ETS) and other testing organizations. (See Stocking and Swanson, 1992, for a discussion of some of these psychometric and systems advances.) At ETS, many activities are under way that relate to the development of operational computer adaptive versions of the Graduate Record Examinations (GRE) General Test and the National Council of State Boards of Nursing (NCSBN) Registered Nurse (RN) and Practical Nurse (PN) exams. In addition, ETS staff are working on a computer adaptive Professional Assessments for Beginning Teachers examination called Praxis I: Computer-Based Academic Skills Assessments. This is a test for which no paper-and-pencil counterpart will exist.

With all of this activity to draw upon, The College Board, the major client of ETS, decided to develop a computer adaptive prototype of the Admissions Testing Program (ATP) SAT. Details of the development of the SAT CAT prototype can be found in a paper by Eignor, Stocking, Way, and Steffen (1993). One important difference between the SAT CAT and the other adaptive tests being developed at ETS, however, is that the SAT CAT prototype was never intended to be used operationally, i.e., to yield scores used for admissions purposes. This decision was made for two reasons: 1) the Program did not have a pool of secure items that could be devoted to the CAT, so the CAT pool had to be built from items that had appeared on past SAT paper-and-pencil forms that have since been disclosed; and 2) even if a pool of secure items had existed for CAT purposes, no mechanism was in place in the schools to deliver the SAT CAT to the many students who would want to take it during the school year.

In the initial planning stage, the SAT CAT prototype was thought of as a means of providing colleges that administer forms of the SAT through the Institutional Admissions Testing Program (IATP) with a convenient way of obtaining SAT scores for admitted students whose records lack these scores. Over the course of the development phase of the project, however, it was decided that the CAT should instead be introduced into selected high schools to examine the feasibility of computer delivery of tests in that setting. The present purpose of the SAT CAT prototype is to give students a quick, yet novel, way to get an indication of how well they would do on the present full-length paper-and-pencil SAT. Such a use necessitated that score comparability between the paper-and-pencil SAT and the SAT CAT be established. The SAT CAT is not alone, however, in needing to establish the comparability of scores derived from the two modes of administration.

All testing programs that test via paper-and-pencil examinations and then want to develop computer-based versions, particularly computer adaptive versions, of these examinations face the difficult issue of establishing the comparability of scores derived from the two administrative modes. CAT and paper-and-pencil testing will, at least for some transition period, continue to coexist in these programs. Further, even if paper-and-pencil testing is eventually phased out, scores from the CAT will still need to be reported on the score scale that existed for the paper-and-pencil examination. All of these considerations make a score comparability study necessary.

The data collection designs for equating test forms that are described in the current literature (see Angoff, 1984) were developed for equating parallel forms of examinations administered via the same medium, which for the most part has been paper-and-pencil. It is unclear whether such designs apply to equating scores derived from administrations of forms in different media, particularly when one score is derived via an adaptive strategy while the other is obtained in a conventional, non-adaptive fashion. However, until new procedures are developed for collecting data to derive comparable scores for CAT and paper-and-pencil examinations, the traditional procedures presented in Angoff (1984) will need to be used. The comparability study described in this paper represents the first attempt at ETS to derive comparable scores on CAT and paper-and-pencil examinations. The study should be viewed in the context provided earlier, viz., that while the CAT scores need to be reported on the existing SAT scales so that students can get a good indication of how well they would do on the paper-and-pencil examination, the CAT scores will never be used operationally for admissions purposes. Had the intention been to use the CAT scores for admissions purposes, a somewhat different data collection design would undoubtedly have at least been considered, and the samples used in the comparability study would have been much larger. This matter will be discussed further in the discussion section of this paper.

The purpose of this paper is to describe the procedures used to establish the 
comparability of scores derived from the SAT CAT prototype and the paper-and-pencil SAT. 
The paper may, in addition, provide a focal point for further discussion of how the 
comparability of scores on CAT and paper-and-pencil examinations might be established in 
the future.



METHOD 



Participating Schools and Students

Collecting data for the comparability study at regular national test center Admissions Testing Program (ATP) administrations of the SAT was not possible, given the large number of examinees taking the paper-and-pencil examinations at the same time at these administrations and the importance placed on the results of the paper-and-pencil testing. Hence, the focus was placed on colleges that administer the SAT through the Institutional Admissions Testing Program (IATP). These colleges administer secure forms of the SAT on their campuses and frequently score the tests themselves, although ETS does maintain a central scoring service for these colleges. SAT scores from IATP administrations may be used for a variety of purposes: for admissions (much as scores from regular national test center administrations are used), for placement, or simply to complete a student's record.

The state of Georgia mandates that all students entering two- and four-year colleges and universities, even if already accepted at these institutions, have SAT scores on their records. In addition, many of these schools use SAT scores for placement purposes and test fairly large numbers of incoming freshmen during the summer orientation period for fall placement into English and Mathematics classes. Hence, institutions in Georgia were seen as an excellent source of data for the comparability study, and a number of them were contacted to see if they would be interested in administering the SAT CAT during the summer orientation period, when the regular paper-and-pencil SAT would be administered. Three Georgia institutions (two two-year institutions and one four-year institution), all in southern Georgia, agreed to participate in the study. The two-year institutions were Darton College and South Georgia College. The four-year institution was Valdosta State College. ETS contracted with an overall computerized testing coordinator residing in Georgia; ten IBM 386 personal computers were rented and shipped in turn to each of the three institutions, and the coordinator oversaw the installation and deinstallation of the equipment. All testing took place during the 1992 summer orientation periods at the three institutions, and the tests were administered by the testing coordinators at each institution. Because these periods did not coincide at the three institutions, the rented equipment could be shared across institutions.

Based on the projected numbers of incoming freshmen who were to take part in the comparability study at the three Georgia institutions, it was determined that additional testing would be needed to augment the total sample size. Invitations to participate were placed in newspapers in the Princeton, New Jersey, area and in the ETS weekly newspaper. Interested students, who had to be high school juniors or seniors, were tested at the permanent ETS institutional computer-based testing center at Rider College. These students were paid $25 for taking both the paper-and-pencil SAT and the CAT, their paper-and-pencil fees were waived, and they were given the option of having their paper-and-pencil SAT scores added to their national score records, which are sent to the institutions to which they apply. The students had to make this decision after testing and before they saw their paper-and-pencil scores. Hence, it was felt that the testing for the comparability study at Rider College was done under conditions in which the students would be reasonably motivated to do well.

Incoming freshmen at the three Georgia institutions were paid $30 to participate in the comparability study. In addition, each of the institutions was offered an honorarium. Since each student's paper-and-pencil SAT scores were to be used for fall placement purposes, it was felt that students would be motivated to perform well, particularly on the paper-and-pencil test. It was hoped that the students would also be motivated when they took the CAT; because of the novel nature of the CAT experience, it was felt that they would be interested in it and would attempt to perform well. (Questionnaire data, not included with this paper, bear out that there was a good deal of interest in the CAT.)

Data Collection Design 

Because of the number of examinees anticipated for the comparability study, it was determined early in the planning process that the data collection design would be a random groups counterbalanced design with both tests administered to each group (Design II in Angoff, 1984). Students were to be randomly assigned, within each school or testing center (i.e., Rider College), to the two possible testing orders: CAT then paper-and-pencil, and paper-and-pencil then CAT. For the sample sizes initially anticipated (around 400 students), the random groups counterbalanced design provides much smaller standard errors of equating than the two other designs that could have been considered: the random groups design with one test administered to each group, and the nonequivalent groups common-item design. (See Angoff, 1984, for a discussion of these designs and their standard errors, or Lord, 1950, for a discussion and comparison of the standard errors.)

Practitioners who have recently attempted to establish the comparability of paper-and-pencil and linear computer-based test (CBT) scores (i.e., an intact paper-and-pencil test is simply administered on a computer) via the random groups counterbalanced design have run into the problem of asymmetric practice effects (see Mazzeo and Harvey, 1988). The standard procedure for dealing with data from the random groups counterbalanced design described in Angoff (1984), which calls for pooling the summary statistics (i.e., means and standard deviations) from the two possible test orders, assumes that any practice effects resulting from the testing experience are constant and symmetric. With little experience on which to base a decision and virtually nothing written on the subject of equating CATs to paper-and-pencil tests, the assumption of symmetric practice effects was seen as extremely tenuous; hence, plans were made to equate separately in the two orders and then form some sort of average. For both orders, scores on the CAT were to be equated to scores on the paper-and-pencil test, for which a raw-to-scaled score conversion table already existed. This approach of equating separately in the two orders and then averaging the two equating functions has been discussed by Holland and Thayer (1990). However, as will be discussed later in the paper, separate equatings had to be done for the two orders and averaged in this study for a reason more fundamental than asymmetric practice effects.

As mentioned previously, students were to be randomly assigned to the two testing orders (CAT then paper-and-pencil, or paper-and-pencil then CAT) within each college or testing center. Further, the testing coordinators at the four sites were given the option of administering the CAT and paper-and-pencil tests on the same day or on different days. (A combination of procedures, in which one group of students took both the paper-and-pencil test and the CAT on the same day and another group took the tests on different days, was also possible.) Figure 1 contains the description of the two designs that was sent to the testing coordinators at each of the four sites. Figure 2 contains the detailed procedures sent to these coordinators for splitting the total group to be tested, either on a given day or during the entire testing session, into random subgroups.
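The detailed procedures in Figure 2 are not shown here. As a rough, hypothetical illustration of random assignment within a site, the Python sketch below shuffles the examinees tested at one college or testing center (or on one day) and splits them into the two testing orders in halves that are as equal as possible. The helper name, the seed, and the examinee IDs are invented for the example.

```python
import random


def split_into_orders(examinee_ids, seed=None):
    """Randomly split the examinees tested at one site (or on one day) into the
    two counterbalanced testing orders. Hypothetical helper; it does not
    reproduce the procedures in Figure 2."""
    ids = list(examinee_ids)
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    rng.shuffle(ids)
    half = len(ids) // 2
    # With an odd count, the extra examinee falls into the P-P-first order.
    return {"CAT first": ids[:half], "P-P first": ids[half:]}


groups = split_into_orders(["S01", "S02", "S03", "S04", "S05"], seed=1992)
print(groups)
```

Splitting within each site, rather than across the pooled sample, is what keeps the two order groups comparable site by site.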



Insert Figures 1 and 2 about here 



Tests Administered



The paper-and-pencil SAT form administered in the study was a secure form developed for the national Admissions Testing Program (ATP) and then designated for use in the Institutional Admissions Testing Program (IATP). The form consisted of four 30-minute sections given to all examinees in the same fixed order, the first being a section containing SAT-M items. It should be noted that the variable section of the SAT is removed for IATP administrations, and the section containing the Test of Standard Written English (TSWE) was specifically removed for this study. Hence, the test contained four sections rather than the usual six. The two SAT-V sections contained 45 and 40 items, respectively, while the two SAT-M sections contained 35 and 25 items, respectively. The total 85-item SAT-V contained the usual four item types (sentence completion, analogies, antonyms, and reading comprehension), while the total 60-item SAT-M contained the usual two item types (five-choice regular math items and four-choice quantitative comparison items). (All SAT-V items are five-choice.) Table 1 contains a breakdown of the number of items by item type for SAT-V and SAT-M, and an additional breakdown of the total 60-item SAT-M by content area.



Insert Table 1 about here 



The SAT-V CAT administered to examinees was a fixed-length CAT of 27 items, and the SAT-M CAT was a fixed-length CAT of 20 items. The development of the CAT item pools and the specifics of the SAT CATs are described in a paper by Eignor, Stocking, Way, and Steffen (1993). Table 1 also contains a breakdown by item type of the number of items in the SAT-V and SAT-M CATs and the number of items in the total CAT pools. The numbers of items for the various item types on the CATs are basically proportional to the numbers of items for those item types on the full-length 85- and 60-item paper-and-pencil tests. Table 1 further contains a breakdown of the SAT-M CAT and the SAT-M item pool by content area.

Unlike the paper-and-pencil tests, which were given to each examinee in the same fixed order, the CATs could be taken in either order: examinees were allowed to choose which CAT, Verbal or Math, they wanted to take first. If an examinee, for example, chose to take the Verbal CAT first after the introductory material and tutorials, he or she was administered 27 Verbal items in up to 40 minutes, followed by a brief pause and then 20 Math items in up to 40 minutes. Examinees were not allowed to omit items on the CATs, nor were they allowed to review responses to earlier items (i.e., examinees could progress only forward).

All examinees took both CATs on the same day, and all four sections of the paper-and-pencil test on the same day. As mentioned in the previous section, examinees could either be administered all testing material (the two CATs and the four paper-and-pencil sections) on the same day, or they could receive the CATs on one day and the paper-and-pencil test on another.

Scores to be Equated 

For the paper-and-pencil test, scoring was straightforward. The score for each examinee on the 85-item SAT-V was created via formula scoring, using the formula R - W/4 for five-choice items, where R is the number of right answers and W the number of wrong answers. The score for each examinee on the 60-item SAT-M was also created via formula scoring, using the formula R - W/4 for the 40 regular five-choice items and R - W/3 for the 20 four-choice quantitative comparison items. The separate scores for the two item types were then summed and rounded to the nearest integer, as was the formula score for SAT-V. Hence, rounded formula scores for the paper-and-pencil SAT-V and SAT-M were used in the equatings.
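The formula-scoring rules just described can be written out directly. The Python sketch below is illustrative: the function names are hypothetical, omitted items are assumed to score zero, and exact .5 ties are rounded upward because the report does not state which tie rule was used.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with exact .5 ties rounded upward
    (assumed tie rule; the report does not state one)."""
    return math.floor(x + 0.5)


def formula_score(rights, wrongs, n_choices):
    """Formula score R - W/(k - 1) for k-choice items; omitted items count 0."""
    return rights - wrongs / (n_choices - 1)


def sat_v_score(rights, wrongs):
    """85-item SAT-V: all items are five-choice, so R - W/4, then rounded."""
    return round_half_up(formula_score(rights, wrongs, 5))


def sat_m_score(regular_right, regular_wrong, qc_right, qc_wrong):
    """60-item SAT-M: R - W/4 on the 40 five-choice regular items plus
    R - W/3 on the 20 four-choice quantitative comparison items,
    summed and then rounded."""
    total = (formula_score(regular_right, regular_wrong, 5)
             + formula_score(qc_right, qc_wrong, 4))
    return round_half_up(total)
```

For example, 30 regular items right and 8 wrong, together with 15 quantitative comparison items right and 3 wrong, give 28 + 14 = 42.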

For the CATs, scoring was relatively straightforward but involved some intermediate steps. The paper-and-pencil test administered to examinees was embedded as a "reference test." That is, the paper-and-pencil test, with its associated three-parameter logistic (3-PL) item parameter estimates, was embedded for score creation purposes; the items on the reference test were not used in any of the CATs. An examinee's final ability estimate (θ) on SAT-V, derived after administration of 27 items or however many items the examinee completed in 40 minutes (see a later section of the paper for how not-reached items were treated), was then used with the 3-PL item parameter estimates for the 85-item SAT-V reference test to create an estimated true formula score for the examinee on the reference test. (See Lord, 1980, p. 230, for the formula (15.6) used to create estimated true formula scores.) This true formula score was then rounded to the nearest integer. Exactly the same procedure was used with the examinee's final θ on SAT-M, derived after administration of 20 items or however many items the examinee completed in 40 minutes. Hence, rounded estimated true formula scores on the reference test were used as the SAT-V and SAT-M CAT scores in the equating. Finally, and worth noting again, the paper-and-pencil and CAT scores used in the equatings are both scores on the same form. This does not qualify as an "equating" in the usual sense of the word, since scores on different parallel forms of the same instrument are not being used. Rather, the scores being used are scores on the same form obtained through administrations in two different modes: observed formula scores derived from administration in paper-and-pencil mode, and estimated true formula scores derived from administration in adaptive mode via a computer.
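The following Python sketch shows one way to turn a final θ into an estimated true formula score. It computes the expected formula score on the reference test at that θ, assuming no omitted items, so that each k-choice item contributes P - (1 - P)/(k - 1). This reading of the approach is an assumption, not a transcription of Lord's (1980) Equation 15.6. The scaling constant D = 1.7, the (a, b, c, k) input format, and the three-item reference test are also assumptions.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def estimated_true_formula_score(theta, items):
    """Expected formula score on the reference test at ability theta, assuming
    no omits: each k-choice item contributes P - (1 - P)/(k - 1).
    `items` is a list of (a, b, c, k) tuples (hypothetical input format)."""
    total = 0.0
    for a, b, c, k in items:
        p = p_3pl(theta, a, b, c)
        total += p - (1.0 - p) / (k - 1)
    return total


# Hypothetical three-item, five-choice reference test.
reference_items = [(1.0, -0.5, 0.2, 5), (0.8, 0.0, 0.2, 5), (1.2, 0.7, 0.2, 5)]
# Python's round() sends exact .5 ties to the even integer; the report gives no tie rule.
cat_score = round(estimated_true_formula_score(0.3, reference_items))
```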

RESULTS 

Numbers of Examinees Tested and Deletion of Cases 

The numbers of examinees at each of the colleges/test centers who took the paper-and-pencil SAT and the SAT CAT on the same day and on different days are presented in Table 2. Individual colleges/test centers are not identified by name in Table 2 or in other related tables; instead, they are referred to as College/Centers A-D. Table 2 also presents the numbers of examinees who took the CAT first and who took the paper-and-pencil test (abbreviated as P-P) first, on the same or different days.



Insert Table 2 about here 



Although fairly elaborate instructions were prepared for splitting the total groups of examinees to be tested into randomly equivalent (i.e., counterbalanced) subgroups (see Figure 2), it is clear from the data in Table 2 that the counterbalancing procedures were not closely followed. A review of the number of tests given per day at each of the four colleges/testing centers indicated that counterbalancing procedures were closely followed each day at only two of them. Hence, pooling the data from the two testing orders was clearly not possible, because the groups taking the two orders were not randomly equivalent, and separate Verbal and Math equatings had to be performed for each of the two orders.

Before any analyses could take place, examinees' CAT and paper-and-pencil records had to be matched. This was done by matching on candidates' ID numbers (the first eight digits of their social security numbers). In the process of matching, it was found that a number of examinees had not taken both the CAT and the paper-and-pencil test. In addition, the records for a number of examinees could not be matched. Finally, the ten examinees from College/Center B and College/Center C who took the CAT and paper-and-pencil tests on different days were dropped from the data sets; clearly, no attempt had been made to form counterbalanced groups with these students. Table 3 contains the number of examinees remaining in the data sets after CAT and paper-and-pencil records were matched and examinees with incomplete data or who were inappropriately tested were removed.
Insert Table 3 about here



Incomplete CATs 

While the timing of the CATs was seen as more than ample (40 minutes for 27 SAT-V items, 40 minutes for 20 SAT-M items), it was anticipated that not all examinees would complete the CATs. In lieu of a formal study, a somewhat arbitrary rule was put into place: an examinee had to complete at least 75% of each of the CATs, i.e., 21 SAT-V items and 15 SAT-M items, in order to be included in the study. For examinees completing at least 75% but less than 100% of one or both of the CATs, the final θ used to create an estimated true formula score was the θ derived after the last item attempted.

Five examinees failed to complete the SAT-M CAT, but all of them completed at least 15 items. Eleven examinees failed to complete the SAT-V CAT, but all of them completed at least 21 items. Hence, no examinees were eliminated from the comparability study on the basis of the 75% completion rule.
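The item counts in the completion rule follow from rounding 75% of each test length up to a whole item: 0.75 × 27 = 20.25, which rounds up to 21, and 0.75 × 20 = 15. The small Python sketch below applies the rule; the function names are hypothetical.

```python
import math


def min_items_required(test_length, proportion=0.75):
    """Smallest whole number of items that is at least `proportion` of the test."""
    return math.ceil(proportion * test_length)


def meets_completion_rule(n_verbal_done, n_math_done, verbal_length=27, math_length=20):
    """True if an examinee completed at least 75% of each fixed-length CAT."""
    return (n_verbal_done >= min_items_required(verbal_length)
            and n_math_done >= min_items_required(math_length))


print(min_items_required(27), min_items_required(20))  # 21 15
```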

Summary Data by Institution and for Total Groups 

Table 4 contains CAT and paper-and-pencil summary data (means, standard deviations, correlations, and sample sizes) for SAT-V, separately by testing order for each of the four colleges/testing centers and then for the total groups. Table 5 contains comparable data for SAT-M. The numbers in parentheses in Tables 4 and 5 are the summary statistics and sample sizes after removal of outlying pairs of scores; this procedure will be described in a subsequent section of the paper.



Insert Tables 4 and 5 about here 



As can be seen from the data in Tables 4 and 5, there is a good deal of variation in average performance across the four colleges/testing centers. The weakest performers were the examinees from College/Center A and the strongest were the examinees from College/Center D. Apart from the somewhat lower correlations between SAT-M CAT and paper-and-pencil test scores for examinees from College/Center A, particularly in the paper-and-pencil-first order, no other data in Tables 4 and 5 appear peculiar. The CAT/paper-and-pencil correlations for the total groups are particularly high. For the SAT-M CAT-first order, the correlation (.933) is almost as high as could possibly be expected given the reliabilities of the CAT and paper-and-pencil tests (neither of which is estimated to exceed .94).
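The phrase "almost as high as could possibly be expected" rests on the classical test theory result that the correlation between two observed scores cannot exceed the square root of the product of their reliabilities. Taking both reliabilities to be at most .94, as stated above, gives the following bound. It is approximate, because the classical result is being applied to an adaptively derived score:

$$r_{XY} \le \sqrt{\rho_{XX'}\,\rho_{YY'}} \le \sqrt{(.94)(.94)} = .94$$

The observed .933 therefore falls within .007 of the largest value these reliabilities allow.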
Outlier Analysis

Although the CAT and paper-and-pencil correlations for the four testing orders (two for SAT-V and two for SAT-M) were quite high, initially ranging from .897 to .927, it was felt that the correlations might be improved if a bivariate outlier analysis were performed on each order and outlying pairs of scores removed. For each of the two orders for SAT-V and for SAT-M, a bivariate plot of standardized scores was created, with standardized paper-and-pencil scores on the abscissa and standardized CAT scores on the ordinate. Figures 3 and 4 contain the two SAT-V plots, while Figures 5 and 6 contain the comparable plots for SAT-M. Each point in a plot represents the standardized paper-and-pencil and CAT scores for a particular examinee. Figures 3-6 do appear to contain some outliers, but for the most part the elliptical shapes formed by the complete sets of points reflect the high correlations between the scores.



Insert Figures 3-6 about here 



To determine which outlying pairs of points might be excluded, a criterion suggested by Barnett and Lewis (1984, p. 245) was applied; this criterion is based on a multivariate normal model. In the bivariate case, the criterion function can be written as

$$R = \frac{1}{1 - r^2}\left(X^2 - 2rXY + Y^2\right),$$

where r is the correlation between the standardized scores X and Y. An observation (a pair of standardized scores X and Y) is considered an outlier, i.e., not a member of the same population as the other observations, at the α level of statistical significance if

$$R > -2\ln\left[1 - (1 - \alpha)^{1/N}\right],$$

where N is the total sample size.

For α = .05 and the SAT-V CAT-first order (N = 271), R must exceed 17.1 for an observation to be considered an outlier; for α = .01, R must exceed 20.4. Cutoffs of comparable size result for the other three orders. However, because of the high CAT/paper-and-pencil correlations, the highest R seen for any observation across all four orders was 15.1. Hence, at the α = .05 level, none of the observations in any of the four orders would be considered an outlier if the Barnett and Lewis statistical criterion were used.
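The Python sketch below computes the two quantities just described for one order. It assumes standardized scores and the sample correlation r: `outlier_statistic` gives the criterion R for a single pair, and `outlier_cutoff` gives the critical value -2 ln[1 - (1 - α)^(1/N)]. The cutoff function reproduces the 17.1 and 20.4 values quoted above for N = 271. The `flag_outliers` helper is a hypothetical convenience that applies any chosen cutoff, such as the arbitrary R > 7 rule adopted later in this section.

```python
import math


def outlier_statistic(x, y, r):
    """Barnett-Lewis bivariate criterion R for standardized scores x, y
    with correlation r."""
    return (x * x - 2.0 * r * x * y + y * y) / (1.0 - r * r)


def outlier_cutoff(alpha, n):
    """Value R must exceed for an observation to be an outlier at level alpha
    among n observations, under the bivariate normal model."""
    return -2.0 * math.log(1.0 - (1.0 - alpha) ** (1.0 / n))


def flag_outliers(pairs, r, cutoff):
    """Indices of (x, y) pairs whose R exceeds the cutoff (hypothetical helper)."""
    return [i for i, (x, y) in enumerate(pairs) if outlier_statistic(x, y, r) > cutoff]


print(round(outlier_cutoff(0.05, 271), 1))  # 17.1
print(round(outlier_cutoff(0.01, 271), 1))  # 20.4
```

Under the bivariate normal model, R for a single observation follows a chi-square distribution with 2 degrees of freedom. The cutoff is the value that the largest of N independent such statistics exceeds with probability α.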
After further study of the bivariate plots and the R values for the observations in all four orders, it was decided that an arbitrary value of R > 7 would be used as the cutoff for deciding which observations qualified as outliers. Observations with R > 7 are circled in Figures 3-6, with their R values printed alongside the points. These score pairs were then deleted from the data sets, and the summary statistics in Tables 4 and 5 were recalculated; the recalculated values are presented in parentheses in these tables. The data sets with these outliers removed were then used in the subsequent equatings.

Removal of these outliers clearly improved the correlations between scores for each of the four orders. Moreover, the score pairs removed are clearly the outliers in Figures 3-6. In sum, although the outlier analysis done for this study was based on an arbitrary criterion, both the improved correlations and the plots indicate that the deletion process was effective.

Total Group Means and Frequency Distributions

Total group means and standard deviations for the two orders for SAT-V and for SAT-M were extracted from Tables 4 and 5 and are presented in Table 6. Noteworthy observations about the data in Table 6 are: 1) for three of the four orders, average performance on the test taken second is lower than average performance on the test taken first (the SAT-V paper-and-pencil-first order being the exception); 2) for both SAT-V orders, the paper-and-pencil standard deviations are smaller than the CAT standard deviations; and 3) for both SAT-M orders, the paper-and-pencil standard deviations are larger than the CAT standard deviations. The finding that, for three of the four orders, performance decreases on the test taken second, presumably because of fatigue, runs somewhat counter to the findings of most of the studies reviewed by Mazzeo and Harvey (1988). In those studies there appeared to be a practice effect on the test taken second, although the practice effects were frequently not symmetric. Also, the studies reviewed by Mazzeo and Harvey involved linear computerized tests equated to paper-and-pencil tests, not CATs equated to paper-and-pencil tests.



Insert Table 6 about here 



Figure 7 contains grouped frequency distributions of estimated true formula scores from the SAT-V CAT and observed formula scores from the paper-and-pencil SAT-V for the two testing orders. Figure 8 contains comparable grouped frequency distributions for SAT-M. Only one examinee obtained a maximum possible score (an observed formula score of 60 on the paper-and-pencil SAT-M). Because the total sample sizes for the four orders are fairly small, score frequencies in the ungrouped frequency distributions (not presented in the paper) are often extremely small, and data are sparse in certain regions. Clearly, if a curvilinear equating procedure were to be used to establish comparable scores for the CAT, these frequency distributions would need to be smoothed.



Insert Figures 7 and 8 about here 



Equatings Performed and Final Unrounded Conversions 

As mentioned earlier, the data sets from which outlying pairs of scores had been removed were then used in four single-group equatings (two for SAT-V, two for SAT-M). The N's used in these equatings, which are presented in parentheses in Tables 4 and 5, are: SAT-V CAT first, N = 266; SAT-V P-P first, N = 230; SAT-M CAT first, N = 267; and SAT-M P-P first, N = 230. For each of the four single-group equatings, two procedures were used:

1. A linear procedure based on setting CAT and paper-and-pencil standard deviates equal; and

2. A curvilinear procedure based on an equipercentile equating of unsmoothed CAT and paper-and-pencil score distributions.
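Procedure 1 amounts to a single formula: (y - mean_y)/sd_y = (x - mean_x)/sd_x, solved for y. The Python sketch below estimates it from single-group data for one testing order. It is offered as an illustration rather than as the study's software, and the function name is hypothetical.

```python
import statistics


def linear_equating(cat_scores, pp_scores):
    """Single-group linear equating: returns a function that maps a CAT score x
    to the paper-and-pencil metric by setting standard deviates equal,
    (y - mean_y) / sd_y = (x - mean_x) / sd_x.
    Both score lists come from the same examinees within one testing order."""
    mean_x, sd_x = statistics.mean(cat_scores), statistics.pstdev(cat_scores)
    mean_y, sd_y = statistics.mean(pp_scores), statistics.pstdev(pp_scores)
    slope = sd_y / sd_x  # the n versus n - 1 choice cancels in this ratio
    return lambda x: mean_y + slope * (x - mean_x)
```

Because the same examinees supply both score lists, no adjustment for group differences is needed within an order. That adjustment is exactly what the counterbalanced design would otherwise provide across orders.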

For each order for each test, the (raw-to-raw) linear and curvilinear equating functions were
compared to see whether there was any evidence of a curvilinear relationship between CAT and
paper-and-pencil scores. This was done through the use of difference plots, with the linear
conversion used as the criterion and the differences between the curvilinear and linear
conversions plotted against the linear conversion. The two SAT-V plots are shown in
Figure 9 and the two SAT-M plots in Figure 10. In each plot, the zero-difference (straight)
line represents the linear conversion, and the nonlinear curve represents the differences
between the curvilinear and linear conversions across all obtained score points. If the
relationship between CAT and paper-and-pencil scores is curvilinear, this latter curve will
appear convex or concave with respect to the zero-difference line or, in certain instances,
S-shaped.
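The quantities shown in each difference plot can be computed as follows. For every obtained score point, the horizontal coordinate is the linear conversion and the vertical coordinate is the curvilinear conversion minus the linear one. The two conversions are passed in as callables. The helper names are hypothetical, and the quadratic "curvilinear" function in the usage line is made up purely to produce a bowed pattern.

```python
from typing import Callable, Iterable, List, Tuple

Conversion = Callable[[float], float]


def difference_points(scores: Iterable[float], linear: Conversion,
                      curvilinear: Conversion) -> List[Tuple[float, float]]:
    """Return (linear value, curvilinear - linear) pairs for a difference plot.

    The zero line of the plot is the linear conversion itself; a convex,
    concave, or S-shaped pattern in the differences suggests curvilinearity.
    """
    points = []
    for s in scores:
        lin = linear(s)
        points.append((lin, curvilinear(s) - lin))
    return points


def max_abs_difference(points: List[Tuple[float, float]]) -> float:
    """Largest departure of the curvilinear conversion from the linear one."""
    return max(abs(d) for _, d in points)


# Made-up conversions: the "curvilinear" one bends away from the line (convex pattern).
pts = difference_points(range(0, 5), lambda s: 2 * s + 1,
                        lambda s: 2 * s + 1 + 0.1 * (s - 2) ** 2)
print(pts, max_abs_difference(pts))
```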



Insert Figures 9 and 10 about here 



In none of the four plots in Figures 9 and 10 is there any real evidence of
curvilinearity in the (raw-to-raw) relationship between CAT and paper-and-pencil scores.
Hence, the linear procedure was chosen for each of the four test-by-order combinations, and
another set of linear equatings was performed, this time reading in the raw-to-scale
conversion tables for the paper-and-pencil SAT-V and the paper-and-pencil SAT-M, so that
the output would contain SAT-V and SAT-M CAT raw-to-scale conversion tables for each of
the orders reflecting the results of the equating process.
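Reading in the paper-and-pencil raw-to-scale table amounts to composing two functions: the linear raw-to-raw equating and the paper-and-pencil conversion. The minimal sketch below rests on two assumptions the report does not state. First, the paper-and-pencil table is linearly interpolated at non-integer equated scores. Second, equated scores that fall outside the table are left missing, which is how missing points can arise (compare Table 8). The helper names are hypothetical. The sketch uses the rounded Table 4 totals (SAT-V, CAT first) and the Table 7 paper-and-pencil entries for formula scores 81 and 82. With these inputs, a CAT formula score of 85 comes out at about 741.0, close to the 740.9539 in Table 7. The small gap reflects the rounding of the tabled means and standard deviations.

```python
import math
from typing import Dict, Optional, Tuple


def linear_equate(x: float, mean_x: float, sd_x: float,
                  mean_y: float, sd_y: float) -> float:
    """CAT raw score -> paper-and-pencil raw score by equating standard deviates."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)


def interpolate_scale(raw: float, table: Dict[int, float]) -> Optional[float]:
    """Linearly interpolate a raw-to-scale table; None outside its range."""
    lo = math.floor(raw)
    if raw == lo:
        return table.get(lo)
    hi = lo + 1
    if lo not in table or hi not in table:
        return None  # equated score falls beyond the tabled conversion
    return table[lo] + (raw - lo) * (table[hi] - table[lo])


def cat_raw_to_scale(x: float, moments: Tuple[float, float, float, float],
                     pp_table: Dict[int, float]) -> Optional[float]:
    """Compose the raw-to-raw linear equating with the paper-and-pencil conversion."""
    return interpolate_scale(linear_equate(x, *moments), pp_table)


# Excerpt of the SAT-V paper-and-pencil conversion (Table 7).
pp_table = {81: 735.7382, 82: 742.5784}
# (mean_cat, sd_cat, mean_pp, sd_pp): Table 4 totals, CAT first, after outlier removal.
moments = (33.48, 18.22, 32.54, 17.41)
print(cat_raw_to_scale(85, moments, pp_table))
```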

Table 7 contains the unrounded raw-to-scale SAT-V conversions resulting from the
two orderings, CAT taken first and paper-and-pencil taken first. Also contained in Table 7
is the unrounded paper-and-pencil raw-to-scale conversion. The two CAT conversions are
very similar. They differ by 8.81 score points (on the unrounded 200-to-800 scale) at the
maximum formula score of 85, and the largest difference in the table, about 10.8 points,
occurs at a formula score of 78. Because the two conversions are so similar, a decision was
made to simply average them in deriving the final unrounded SAT-V CAT conversion table.
This unrounded average conversion is also presented in Table 7.
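Averaging the two order-specific conversions, and finding where they differ most, is simple bookkeeping. The sketch below uses hypothetical helper names and the Table 7 entries for formula scores 85 through 83. Even in this small excerpt, the largest gap (9.22 points, at a formula score of 83) is not at the top score, because the nonlinear paper-and-pencil conversion is composed with each linear equating.

```python
from typing import Dict, Tuple

Conversion = Dict[int, float]


def average_conversions(a: Conversion, b: Conversion) -> Conversion:
    """Unweighted (1:1) average at every formula score present in both tables."""
    return {s: (a[s] + b[s]) / 2 for s in a.keys() & b.keys()}


def largest_difference(a: Conversion, b: Conversion) -> Tuple[int, float]:
    """Formula score at which the two conversions differ most, and that difference."""
    score = max(a.keys() & b.keys(), key=lambda s: abs(a[s] - b[s]))
    return score, abs(a[score] - b[score])


# Table 7 excerpt: SAT-V CAT conversions from the two testing orders.
cat_first = {85: 740.9539, 84: 734.2651, 83: 726.9687}
pp_first = {85: 749.7674, 84: 742.8217, 83: 736.1918}

avg = average_conversions(cat_first, pp_first)
print(avg[85])
print(largest_difference(cat_first, pp_first))
```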



Insert Table 7 about here 



Table 8 contains the unrounded raw-to-scale SAT-M CAT conversions resulting from
the two orderings, CAT taken first and paper-and-pencil taken first. Also contained in Table
8 is the unrounded paper-and-pencil raw-to-scale conversion. Each of the CAT conversions is
higher than the paper-and-pencil conversion at the top and lower at the bottom. As a result,
the most extreme CAT scores equate to paper-and-pencil raw scores beyond the range of the
paper-and-pencil conversion table, so there are missing points at the top and at the bottom
of each of the CAT conversions. If the conversions are reasonably linear, missing conversion
points can be established via linear interpolation.
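Later, the report says missing points were filled using the five adjacent formula score points that had conversions. One plausible reading, assumed here, is an ordinary least-squares line through those five points, extended to the missing scores. The exact rule is not given, so this sketch should not be expected to reproduce the filled-in values in Table 8 exactly. The function name is hypothetical.

```python
from typing import Dict, Iterable, List


def fill_by_linear_fit(table: Dict[int, float], anchors: List[int],
                       missing: Iterable[int]) -> Dict[int, float]:
    """Fit y = a + b*score by least squares to the anchor points and
    evaluate the fitted line at each missing formula score."""
    ys = [table[s] for s in anchors]
    n = len(anchors)
    mean_s = sum(anchors) / n
    mean_y = sum(ys) / n
    sxx = sum((s - mean_s) ** 2 for s in anchors)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(anchors, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    return {s: intercept + slope * s for s in missing}


# 2:1 weighted-average SAT-M values at formula scores 51-55 (Table 8).
top = {55: 763.5739, 54: 754.0232, 53: 743.8553, 52: 733.6725, 51: 723.4575}
filled = fill_by_linear_fit(top, anchors=[55, 54, 53, 52, 51], missing=range(56, 61))
print(filled)
```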



Insert Table 8 about here 



Unlike the SAT-V conversions, the two SAT-M CAT conversions presented in Table 8 are quite
dissimilar. At a formula score of 55, the two conversions differ by 30.15 points (on the
unrounded 200-to-800 scale). In addition, the paper-and-pencil first SAT-M CAT conversion
is a good deal more discrepant from the original SAT-M paper-and-pencil conversion than is
the CAT first SAT-M CAT conversion. Because scores in this study are being created on two
different "versions" of the same test form (one score created via the CAT process and
the other via regular paper-and-pencil testing), it is reasonable to expect the CAT
conversion to approximate the original paper-and-pencil conversion fairly closely. Given
this, the SAT-M CAT conversion resulting from the paper-and-pencil then CAT testing order
is clearly the outlier. A decision was made not only to simply average the CAT conversions
from the two orders, but also to form weighted averages in which the CAT conversion from the
CAT then paper-and-pencil order counted two (2:1) and three (3:1) times as much as the
CAT conversion from the paper-and-pencil then CAT order. (Although the CAT conversion
from the paper-and-pencil then CAT order was very discrepant, no rationale could be found
for discarding it completely.) After review of the weighted averages, it
was decided that the most extreme weighted average that could be justified was the
2:1 weighted average. (In addition, the 3:1 weighting gave much the same results as
the 2:1 weighting once scores were rounded.) The 2:1 weighted average is presented in
Table 8 along with the unweighted (1:1) average. Missing conversion points for the 2:1
weighted average (for formula scores 56-60 and for -16 and -17) were determined via linear
interpolation using the five adjacent formula score points at the top and at the bottom
that had conversion points. Finally, the 2:1 weighted average was used to create the final
rounded SAT-M CAT conversion to be used for score reporting purposes.
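The 1:1 and 2:1 averages differ only in their weights. Below is a minimal sketch with hypothetical names. It is checked against the Table 8 rows for formula scores 55 and 54. At score 54, for example, the order-specific conversions are 743.6553 (CAT first) and 774.7589 (paper-and-pencil first), and Table 8 reports a 1:1 average of 759.2071 and a 2:1 average of 754.0232.

```python
from typing import Dict

Conversion = Dict[int, float]


def weighted_conversion(cat_first: Conversion, pp_first: Conversion,
                        w_cat_first: float = 2.0, w_pp_first: float = 1.0) -> Conversion:
    """Weighted average of two conversions at the formula scores they share.

    w_cat_first=2, w_pp_first=1 gives the 2:1 average; equal weights give 1:1.
    Scores missing from either conversion are left out (filled in separately).
    """
    total = w_cat_first + w_pp_first
    return {s: (w_cat_first * cat_first[s] + w_pp_first * pp_first[s]) / total
            for s in cat_first.keys() & pp_first.keys()}


cat_first = {55: 753.5226, 54: 743.6553}
pp_first = {55: 783.6765, 54: 774.7589}
two_to_one = weighted_conversion(cat_first, pp_first)
one_to_one = weighted_conversion(cat_first, pp_first, 1.0, 1.0)
print(two_to_one, one_to_one)
```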

Doglegs and Final Rounded Conversions

Table 9 presents the final SAT-V CAT raw-to-scale conversion using both unrounded
and rounded (reported) scaled scores. This final SAT-V CAT conversion was formed by
simply averaging the conversions derived from the linear equatings in the two separate
orders. Also presented in Table 9 are the unrounded and rounded (reported) raw-to-scale
conversions for the form given in paper-and-pencil mode. As can be seen in Table 9, for
higher formula scores the CAT raw-to-scale conversion is lower than the paper-and-pencil
raw-to-scale conversion, sometimes by as much as 20 scaled score points on the rounded scale.
This is a direct outcome of the fact that the CAT estimated true formula score standard
deviations were greater in both orders than the paper-and-pencil observed formula score
standard deviations. (This is reflected in a slope parameter that is less than one in the linear
equation derived by setting CAT and paper-and-pencil standard deviates equal.) The
conversion for the form given in paper-and-pencil mode did not scale to 800, which is an
ATP requirement, so a dogleg (see Braun and Holland, 1982) had to be fit to the
top of the conversion to allow a formula score of 85 to scale to 800. (This dogleg is
presented in parentheses in Table 9.) Because the SAT-V CAT conversion is lower than the
paper-and-pencil conversion at the top, a dogleg encompassing more scaled score points had
to be fit to the top of the CAT raw-to-scale conversion (also presented in parentheses). In
both cases, the doglegs were constructed to give a smooth progression of scores, with
the maximum formula score (85) reaching 800. Finally, the CAT raw-to-scale conversion
presented at the far right of Table 9, under the column labeled "Reported", is the conversion
embedded in the CAT system for on-screen score reporting purposes.
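Reported scores are the unrounded conversions rounded to the 10-point reporting scale. The text does not spell out the rounding rule. The sketch below assumes rounding to the nearest multiple of 10 (halves rounded up) and truncation to the 200-to-800 range. Under that assumption it reproduces the reported columns of Table 9, using the parenthesized dogleg values where they are given. At formula score 78, for example, 713.5629 becomes 710 and 694.0555 becomes 690, a difference of 20. At formula score 82, the dogleg values 745.1 and 735.1 become 750 and 740. The function names are hypothetical.

```python
import math


def reported_score(unrounded: float, low: int = 200, high: int = 800) -> int:
    """Round to the nearest multiple of 10 (halves up) and clip to [low, high]."""
    rounded = int(math.floor(unrounded / 10.0 + 0.5)) * 10
    return max(low, min(high, rounded))


def reported_difference(pp_unrounded: float, cat_unrounded: float) -> int:
    """Paper-and-pencil reported minus CAT reported (last column of Table 9)."""
    return reported_score(pp_unrounded) - reported_score(cat_unrounded)


print(reported_score(713.5629), reported_score(694.0555))
print(reported_difference(745.1, 735.1))
```

`math.floor(x + 0.5)` is used rather than the built-in `round`, because Python's `round` sends exact halves to the nearest even value.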



Insert Table 9 about here



Table 10 presents the final SAT-M CAT raw-to-scale conversion using both
unrounded and rounded (reported) scaled scores. This final SAT-M conversion was formed
as a weighted average of the conversions derived from the linear equatings in the two
separate orders, counting the CAT then paper-and-pencil conversion twice as much as the
paper-and-pencil then CAT conversion. Also presented in Table 10 are the unrounded and
rounded (reported) raw-to-scale conversions for the form given in paper-and-pencil mode.
As can be seen in Table 10, for higher formula scores the CAT raw-to-scale conversion is
higher than the paper-and-pencil raw-to-scale conversion, sometimes by as much as 20 scaled
score points on the rounded scale. This is a direct outcome of the fact that the CAT
estimated true formula score standard deviations were smaller in both orders than the
paper-and-pencil observed formula score standard deviations. (This is reflected in a slope
parameter that is greater than one in the linear equation derived by setting CAT and
paper-and-pencil standard deviates equal.) The conversion for the form given in
paper-and-pencil mode did not quite scale to 800, so a dogleg had to be fit, but only at the
very top of the conversion (at a formula score of 60). Because the SAT-M CAT conversion is
higher than the paper-and-pencil conversion at the top, no dogleg was necessary for the CAT
conversion. As with SAT-V, the CAT raw-to-scale conversion presented at the far right of
Table 10, under the column labeled "Reported", is the conversion embedded in the CAT system
for on-screen score reporting purposes.
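The opposite top-of-scale behavior of SAT-V and SAT-M follows from the slope of the linear raw-to-raw equation. That slope is the ratio of the paper-and-pencil standard deviation to the CAT standard deviation. The sketch below computes it from the Totals rows of Tables 4 and 5, using the values after removal of outlying pairs. The variable names are illustrative.

```python
# (CAT SD, paper-and-pencil SD) for each test and order, after removal of
# outlying pairs (Totals rows of Tables 4 and 5).
totals = {
    ("SAT-V", "CAT first"): (18.22, 17.41),
    ("SAT-V", "P-P first"): (17.65, 17.08),
    ("SAT-M", "CAT first"): (14.96, 15.58),
    ("SAT-M", "P-P first"): (13.64, 14.57),
}

# Slope of y = mean_pp + slope * (x - mean_cat) when standard deviates are set equal.
slopes = {key: sd_pp / sd_cat for key, (sd_cat, sd_pp) in totals.items()}

for (test, order), slope in slopes.items():
    direction = "compresses" if slope < 1 else "stretches"
    print(f"{test}, {order}: slope {slope:.3f} ({direction} the CAT score range)")
```

In both orders the SAT-V slopes fall below one and the SAT-M slopes exceed one. Because the CAT and paper-and-pencil means are close, high CAT scores therefore map to lower paper-and-pencil raw scores for SAT-V and to higher ones for SAT-M.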



Insert Table 10 about here 



DISCUSSION 

Because scores were being created on the same test form via administrations done in
two different ways for the comparability study described in this paper (adaptively via
computer and conventionally via paper-and-pencil), it was anticipated that the
relationship between the CAT and paper-and-pencil scores would likely be linear and that the
resulting Verbal and Math CAT raw-to-scale conversions would be quite similar to the
Verbal and Math paper-and-pencil raw-to-scale conversions. While all equating relationships
in the study appeared to be quite linear, the final CAT conversion tables were not as similar
to the paper-and-pencil conversion tables as expected. In addition, the outcome for the final
SAT-V CAT conversion differed from that for the final SAT-M CAT conversion. In the case of
SAT-V, the final CAT conversion was lower than the paper-and-pencil conversion at the top of
the scale, while for the SAT-M CAT just the opposite occurred: the CAT conversion was higher
than the paper-and-pencil conversion at the top of the scale. As mentioned earlier, this was
the direct result of the differences observed between the CAT estimated true formula score
and paper-and-pencil observed formula score standard deviations. In the case of SAT-V, the
CAT standard deviations were greater than the paper-and-pencil standard deviations in both
orders, while for SAT-M the CAT standard deviations were less than the paper-and-pencil
standard deviations in both orders. At this point, it would seem that these results are
related to as yet unexplained differences between the SAT-V and SAT-M CAT test-taking
experiences. One possible explanation now being explored has to do with differences in the
percentages of examinees completing the SAT-V CAT and paper-and-pencil tests versus
differences in the percentages of examinees completing the SAT-M CAT and paper-and-pencil
tests, and the relationship of these differences to ability level.

One clear outcome of this study is that the random groups counterbalanced equating
design should probably be avoided in comparability studies of this sort. Even though fairly
elaborate directions for counterbalancing were created, these directions were not followed,
and the groups taking the tests in the two orders could not be considered randomly
equivalent. However, even if the counterbalancing directions had been followed, results
from the Mazzeo and Harvey (1988) review indicate that, in a random groups counterbalanced
design, the effect of taking one sort of test first (for example, a CAT) is likely not to be
the same as the effect of taking another sort of test first (for example, a paper-and-pencil
test). It would appear that, in equating a CAT, or any computer-based test, to a
paper-and-pencil test, it is a bad idea to set up a design in which examinees take the tests
to be equated sequentially. This observation holds equally well for the common item,
non-equivalent groups equating design if the anchor test is an externally administered block
of items given in either paper-and-pencil or computer format. That is, suppose one is
attempting to equate scores on a CAT to scores on a paper-and-pencil test using the common
item, non-equivalent groups design, and the common items are a set of external items
(external to both the CAT and the paper-and-pencil form) given in paper-and-pencil format
after the CAT or the paper-and-pencil test. In this case, it is highly likely that the
experience of taking the CAT first will influence performance on the external set of common
items differently from the experience of taking the paper-and-pencil form first. In short,
a design is needed in which each examinee takes only one of the tests to be equated, in one
mode or the other. The random groups design is such a design, but the standard errors of
equating associated with it necessitate much larger sample sizes than do the two designs
just discussed.
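The sample-size point can be made concrete with standard large-sample, normal-theory approximations to the standard error of a linear equating at a score z standard deviations from the mean. The report does not give these formulas; they are brought in here as an assumption, for illustration only. For random groups of sizes n_x and n_y, SE² ≈ σ_y²(1/n_x + 1/n_y)(1 + z²/2). For a single group of size N, with correlation ρ between the two tests, SE² ≈ σ_y²(1 − ρ)(2 + (1 + ρ)z²)/N. At the mean (z = 0) the two standard errors agree when each random group has N/(1 − ρ) examinees. With correlations near .91, as in Tables 4 and 5, each of the two random groups would therefore need roughly eleven times as many examinees as the single group. The function names and the illustrative inputs are hypothetical.

```python
import math


def se_linear_random_groups(sd_y: float, n_x: float, n_y: float, z: float) -> float:
    """Approximate SE of a linear equating, random groups design (normal theory)."""
    return sd_y * math.sqrt((1.0 / n_x + 1.0 / n_y) * (1.0 + z * z / 2.0))


def se_linear_single_group(sd_y: float, n: float, rho: float, z: float) -> float:
    """Approximate SE of a linear equating, single group design (normal theory)."""
    return sd_y * math.sqrt((1.0 - rho) / n * (2.0 + (1.0 + rho) * z * z))


def random_group_size_matching(n_single: float, rho: float) -> float:
    """Per-group N that gives the random groups design the same SE at z = 0."""
    return n_single / (1.0 - rho)


# Illustrative values loosely based on the study: sd_y ~ 17, N ~ 230, rho ~ .91.
print(random_group_size_matching(230, 0.91))
for z in (0.0, 2.0):
    print(z,
          round(se_linear_single_group(17.0, 230, 0.91, z), 3),
          round(se_linear_random_groups(17.0, 230, 230, z), 3))
```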

In sum, although the CAT raw-to-scale conversions in this study differed more from
the paper-and-pencil raw-to-scale conversions than had been anticipated, the conversions
were viewed as acceptable, given the purposes for which the SAT CAT prototype was
developed. This is not to say that the equatings, or more precisely the size and nature of
the samples used in the equatings, would have been viewed as completely adequate if the
scores from the SAT CAT were to be used for actual admissions purposes. If a computer
adaptive version of the SAT is ever constructed from a pool of secure SAT items and the
resulting scores are to be used for actual admissions purposes, much greater attention will
clearly need to be paid to equating and data collection activities. Based on the results of
this study, should this activity take place in the future, it is recommended that a random
groups equating design, with one test administered to each group, be used to establish the
comparability of scores on the CAT and the paper-and-pencil test.
Table 1

Numbers of Items by Item Type and Content Area
in the SAT-V and SAT-M Paper-and-Pencil Test,
CAT, and CAT Item Pools

SAT-V

                        Reading         Antonym   Analogy   Sentence     Total Number
                        Comprehension   Items     Items     Completion   of Items
                        Items                               Items
Paper-and-Pencil Test   25¹             25        20        15           85
CAT                     8²              8         6         5            27
Verbal CAT Pool         91³             74        51        87           303

SAT-M

                        Regular    4-Choice       Arithmetic   Algebra   Geometry   Miscellaneous   Total Number
                        5-Choice   Quantitative   Items        Items     Items      Items           of Items
                        Items      Comparison
                                   Items
Paper-and-Pencil Test   40         20             18-19        17        16-17      7-9             60
CAT                     13         7              5-6          6         6          2-3             20
Math CAT Pool           128        107            70           65        66         34              235

¹Based on 5 or 6 passages with 3 to 5 items per passage.

²Based on 3 passages, with 2 passages having 3 items each and 1 passage having 2 items.

³Based on 27 passages, having from 3 to 6 items per passage, of which either 2 or 3 items are chosen for a CAT.



Table 2

Number of Examinees by College/Testing Center
Taking the Paper-and-Pencil SAT and SAT CAT
on Same and Different Days

                  Paper-and-Pencil and       Paper-and-Pencil and CAT
                  CAT on Same Day            on Different Days
College/Center    CAT First    P-P First     CAT First    P-P First      Total
A                 21           12            23           29             85
B                 102          83            0            7              192
C                 71           64            0            3              138
D                 61           55                                        116

Examinees                      Overall Totals
Both Orders                    521
CAT Taken First                278
Paper-and-Pencil Taken First   243


Table 3

Number of Examinees After Matching and Removal
Because of Incomplete Records/Inappropriate Testing

Examinees                      Overall Totals
Both Orders                    506
CAT Taken First                271
Paper-and-Pencil Taken First   235



Table 4

SAT-V CAT and Paper-and-Pencil Summary Data
by Testing Order for Each of the Four Colleges/Testing
Centers and for the Total Groups

CAT Taken First

                     CAT                             P-P
                     Mean¹           SD              Mean            SD              r             N
College/Center A     27.61 (27.83)²  13.72 (13.82)   28.88 (28.63)   12.92 (12.98)   .819 (.842)   41 (40)
College/Center B     29.68 (29.29)   17.00 (16.94)   28.12 (27.76)   16.01 (15.53)   .888 (.903)   100 (98)
College/Center C³    31.41           18.43           30.45           17.64           .928          69
College/Center D     46.97 (46.71)   16.65 (16.59)   45.74 (45.59)   16.38 (16.60)   .874 (.903)   61 (59)
Totals               33.70 (33.48)   18.27 (18.22)   32.79 (32.54)   17.50 (17.41)   .907 (.919)   271 (266)

Paper-and-Pencil Taken First

                     P-P                             CAT
                     Mean¹           SD              Mean            SD              r             N
College/Center A     23.08 (23.30)   13.05 (13.16)   21.24 (21.86)   15.04 (14.74)   .871 (.878)   38 (37)
College/Center B     27.74 (27.62)   14.38 (14.55)   30.05 (29.96)   17.18 (16.??)   .874 (.896)   78 (76)
College/Center C³    31.36           18.00           31.31           18.46           .930          64
College/Center D     40.13 (40.06)   18.09 (18.08)   39.07 (38.8?)   16.30 (16.54)   .886 (.923)   55 (53)
Totals               30.87 (30.83)   17.06 (17.08)   31.08 (31.08)   17.81 (17.65)   .897 (.913)   235 (230)

¹Means are estimated true formula score or observed formula score means on the 85-item SAT-V.
²Data in parentheses were derived after removal of outlying pairs of scores.
³There were no outlying pairs of scores for this college/center.




Table 5

SAT-M CAT and Paper-and-Pencil Summary Data
by Testing Order for Each of the Four Colleges/Testing
Centers and for the Total Groups

CAT Taken First

                     CAT                             P-P
                     Mean¹           SD              Mean            SD              r             N
College/Center A³    14.42           10.31           14.63           11.28           .861          41
College/Center B     17.52 (17.62)²  12.98 (13.09)   16.81 (16.90)   13.55 (13.56)   .894 (.907)   100 (98)
College/Center C     21.49 (20.81)   15.69 (15.38)   20.59 (19.88)   16.67 (16.30)   .923 (.929)   69 (67)
College/Center D³    34.05           12.83           34.57           12.57           .927          61
Totals               21.78 (21.68)   15.00 (14.96)   21.44 (21.34)   15.67 (15.58)   .927 (.933)   271 (267)

Paper-and-Pencil Taken First

                     P-P                             CAT
                     Mean¹           SD              Mean            SD              r             N
College/Center A³    11.11           9.08            9.63            9.04            .784          38
College/Center B     15.39 (15.25)   13.34 (13.49)   14.10 (14.47)   13.00 (12.95)   .888 (.916)   78 (76)
College/Center C³    21.55           14.55           19.66           13.42           .928          64
College/Center D     29.86 (29.44)   13.26 (13.38)   27.60 (26.48)   13.19 (12.63)   .882 (.902)   55 (52)
Totals               19.76 (19.53)   14.58 (14.57)   18.05 (17.83)   14.00 (13.64)   .913 (.922)   235 (230)

¹Means are estimated true formula score or observed formula score means on the 60-item SAT-M.

²Data in parentheses were derived after removal of outlying pairs of scores.

³There were no outlying pairs of scores for the particular testing order at this college/center.



Table 7

SAT-V PAPER-AND-PENCIL AND CAT
UNROUNDED RAW-TO-SCALE CONVERSIONS

                             CAT Conversions
Formula   Paper-and-Pencil   CAT Taken First   P-P Taken First   Average
Score     Conversion

85        765.7534           740.9539          749.7674          745.3607
84        760.3660           734.2651          742.8217          738.5434
83        749.7511           726.9687          736.1918          731.5802
82        742.5784           719.7467          728.8554          724.3010
81        735.7382           713.1658          721.4636          717.3147
80        728.1016           706.7055          714.6908          710.6981
79        720.4621           697.5467          708.1604          702.8536
78        713.5629           688.6647          699.4463          694.0555
77        706.8445           681.5607          690.3917          685.9762
76        697.2582           674.1265          682.8253          678.4759
75        687.9710           667.1247          675.3950          671.2599
74        680.6924           660.0639          668.2028          664.1334
73        672.8395           652.2745          661.0837          656.6791
72        665.6194           644.8733          653.3250          649.0992
71        658.1815           638.0097          645.7124          641.8611
70        649.7708           631.8787          638.7169          635.2978
69        642.3246           625.5679          632.4079          628.9879
68        635.2870           618.0866          626.1085          622.0976
67        629.2920           610.6547          618.6324          614.6436
66        622.1838           604.1462          611.0088          607.5775
65        613.6360           597.9631          604.4122          601.1877
64        606.7781           592.1326          598.1261          595.1293
63        600.0319           585.8224          592.2280          589.0252
62        594.0447           579.0276          585.8428          582.4352
61        587.6907           572.8717          578.9665          575.9191
60        580.4023           567.0207          572.7375          569.8791
59        573.9099           561.7002          566.8336          564.2669
58        567.6985           555.9431          561.4245          558.6838
57        562.1671           549.2384          555.5343          552.3864
56        556.1771           542.9923          548.7727          545.8825
55        549.1594           537.0428          542.4610          539.7519
54        542.6275           531.7301          536.4538          534.0920
53        536.4192           526.2432          531.1428          528.6930
52        530.9346           519.8786          525.5526          522.7156
51        525.1453           513.5102          518.9741          516.2421
50        518.2675           507.4842          512.5621          510.0231
49        511.6707           502.0592          506.5193          504.2893
48        505.4827           496.8017          501.1484          498.9750
47        500.0655           490.9074          495.7995          493.3535
46        494.5087           484.6111          489.6298          487.1204
45        487.8736           478.5947          483.2700          480.9324
44        481.3245           473.1156          477.2888          475.2022
43        475.3060           467.9507          471.8682          469.9194
42        469.9502           462.2953          466.6260          464.4607
41        464.4627           455.7916          460.5764          458.1840
40        457.6849           449.3464          453.9627          451.6545
39        450.8045           443.4144          447.5882          445.5013
38        444.4807           437.9005          441.7502          439.8254
37        438.7470           432.0777          436.0896          434.0837
36        432.7595           425.3554          429.8611          427.6083
35        425.7150           418.7739          423.1031          420.9385
34        418.8243           412.4304          416.5226          414.4765
33        412.1846           406.5104          410.2429          408.3767
32        406.0062           400.2831          404.1609          402.2235
31        399.4599           393.0181          397.5216          395.2698
30        391.6911           385.9670          390.1072          388.0371
29        384.3955           379.2527          383.1117          381.1822
28        377.4457           373.0158          376.4753          374.7455
27        371.0754           366.6421          370.2583          368.4502
26        364.2633           359.4762          363.5719          361.5241
25        356.3665           352.2551          355.9626          354.1089
24        349.0457           345.2673          348.8792          347.0733
23        341.7375           338.5396          341.8077          340.1737
22        334.9682           331.9003          335.2525          333.5764
21        327.7970           324.9468          328.3304          326.6386



Table 7 (cont.)

SAT-V PAPER-AND-PENCIL AND CAT
UNROUNDED RAW-TO-SCALE CONVERSIONS

                             CAT Conversions
Formula   Paper-and-Pencil   CAT Taken First   P-P Taken First   Average
Score     Conversion


*U 








319.5295 








• XAU9 


312 4ft20 










305 1437 


1/ 






90Q ^RAO 


90H Ofi<3 


xo 


9<Cin fTTOR 
AiVU vQ/ V D 




A9T7 


9Q0 Q5E8 






9SI9 AtlA 


9MS y?i7 


2113 A213 

AOw. 9XX^ 




270.X4O4 


97< ft^^n 




27S SfifiK 


19 


Zo9*l/Z6 


9Cft O^ftlC 


9^1 Afiai 


27ft 23811 


12 


Zol.eisra 


9ai C<9K 


9KX ^noK 

*D4 . 


7<r» 2311 

. Mil XX 


U 


S34.2737 


9<A *VAtt^ 

254.7492 


9<7 97QQ 

257.A/W 




10 


OH OA 

A^o.*139 


247.7097 


250. UO/ / 


24^ ltQft7 


9 


239.6942 


240.7923 


243 .0400 


9A1 ttlKI 
a41*«101 


8 


232.44ol 


233.8708 


290. 0O70 




7 


224.7575 


226.6117 


99tt AH C*V 

SSo. 6l57 


997 7197 


6 


217 .1527 


219*3250 


2Z1.4190 


99rt ^Aax 


5 


A'yA*9 


211*8774 


214. w 041 


919 Q0n7 
ieAX.j¥U/ 




202.0323 


9aa sn^ifi 
204.6039 


9AC CIC91 

2llO.P5fc3. 


9nc 79110 


3 


^ OA AH AC 

IS^.vIOd 


19/ 


IQQ 9ft'VJ( 


XV0.9OUU 


2 


186.2092 


169. 9303 


1 01 


ion 7101 
IvU V / xuo 


1 


179 . 6673 


162.9955 


164.3915 


IM . 0«a5 


0 


172.2797 


170.9b79 


iT9 A^c^a 
1/ / . «59v 


177 llOA 
X/ / .xxuo 


•1 


H CA YflOQ 

164. 7299 


1I>V**449 


170.0/10 


1 RQ QT77 




ldD.9613 




109 . JfcD* 


1K9 moK 


a4 
•3 


H CH 


155.1359 


130. UI»3V 


1 ^4 SI 57 




. / / / A 


HAa 1U<s 
149 . 19^« 




149.7141 






14*. 194 e 


1 An 1 (r>^ 


142 64A2 

X^X . WW 


- ^ 


H9C1 ftsn? 


U5.1Aib¥ 


I^C fl«9Q 
XJO ♦ V JA* 


135 S£29 




H<>H •t^tHQ 


19a fl099 


0495 


19» S174 






191 071^ 


1:21 fl321 
X^ A. V<#&X 


191 4518 


•»g 


107 Q!tSl 




114*7217 


114«3863 


-10       99.6867            107.0302          107.6114          107.3208
-11       92.3383            100.0095          100.5010          100.2552
-12       84.9900            92.9888           93.3906           93.1897
-13       77.6416            85.9683           86.2802           86.1242
-14       70.2932            78.9476           79.1699           79.0567
-15       62.9446            71.9269           72.0595           71.9932
-16       55.5964            64.9062           64.9491           64.9277
-17       48.2480            57.8856           57.8387           57.8621
-18       40.8996            50.8649           50.7283           50.7966
-19       33.5512            43.8442           43.6179           43.7310
-20       26.2028            36.8235           36.5075           36.6655
-21       18.8544            29.8029           29.3971           29.6000






Table 8

SAT-M PAPER-AND-PENCIL AND CAT
UNROUNDED RAW-TO-SCALE CONVERSIONS

                             CAT Conversions
Formula   Paper-and-Pencil   CAT Taken First   P-P Taken First   1:1 Average   2:1 Average
Score     Conversion

60        790.9530                                                             814.0579
59        781.3002                                                             803.9611
58        773.3467           782.8907                                          793.8643
57        762.8606           774.3267                                          783.7675
56        753.1326           763.7168                                          773.6707
55        743.6692           753.5226          783.6765          768.5996      763.5739
54        734.1952           743.6553          774.7589          759.2071      754.0232
53        724.5774           733.7814          764.0029          748.8922      743.8553
52        714.7512           723.7463          753.5248          738.6356      733.6725
51        704.7181           713.4856          743.4011          728.4434      723.4575
50        694.4930           703.0034          733.2631          718.1332      713.0899
49        684.1221           692.3228          722.9508          707.6368      702.5322
48        673.6617           681.4985          712.4019          696.9502      691.7996
47        663.1666           670.3932          701.6222          686.1077      680.9362
46        652.6920           659.6688          690.6413          675.1551      669.9930
45        642.2720           648.7794          679.5194          664.1494      659.0261
44        631.9491           637.9668          668.3236          653.1452      648.0858
43        621.7254           627.2604          657.1202          642.1903      637.2137
42        611.6123           616.6672          645.9621          631.3146      626.4321
41        601.6140           606.1960          634.8964          620.5462      615.7628
40        591.7290           595.8482          623.9429          609.8956      605.2131
39        581.9527           585.6203          613.1119          599.3661      594.7842
38        572.2845           575.5098          602.4105          588.9601      584.4767
37        562.7122           565.5077          591.8381          578.6729      574.2845
36        553.2318           555.6065          581.3960          568.5012      564.2030
35        543.8397           545.8019          571.0765          558.4392      554.2268
34        534.5276           536.0861          560.8653          548.4757      544.3458
33        525.2904           526.4525          550.7576          538.6050      534.5542
32        516.1234           516.8957          540.7476          528.8216      524.8463
31        507.0256           507.4140          530.8265          519.1202      515.2182
30        497.9884           497.9986          520.9884          509.4935      505.6619
29        489.0166           488.6565          511.2296          499.9431      496.1809
28        480.1094           479.3837          501.5443          490.4640      486.7706
27        471.2582           470.1730          491.9311          481.0521      477.4257
26        462.4745           461.0353          482.3917          471.7135      468.1541
25        453.7578           451.9713          472.9188          462.4451      458.9538
24        445.1133           442.9859          463.5198          453.2528      449.8305
23        436.5426           434.0815          454.1970          444.1392      440.7866
22        428.0495           425.2627          444.9570          435.1099      431.8275
21        419.6389           416.5349          435.8049          426.1699      422.9582
20        411.3148           407.9013          426.7419          417.3216      414.1815
19        403.0786           399.3633          417.7735          408.5684      405.5000
18        394.9312           390.9213          408.9040          399.9126      396.9155
17        386.8722           382.5756          400.1347          391.3552      388.4287
16        378.9028           374.3237          391.4662          382.8950      380.0379
15        371.0172           366.1580          382.8990          374.5285      371.7383
14        363.2081           358.0698          374.4302          366.2500      363.5233
Table 8 (cont.)

SAT-M PAPER-AND-PENCIL AND CAT
UNROUNDED RAW-TO-SCALE CONVERSIONS

                             CAT Conversions
Formula   Paper-and-Pencil   CAT Taken First   P-P Taken First   1:1 Average   2:1 Average
Score     Conversion



13 


J^!) •^0/3 




12 




i^^^ • UOV/O 


11 




• X«3Uo 


10 




"^^fi ^A1 Q 


9 


^zh* 7:)X7 




8 


31/ • ^oZh 


"^1 n A1 RA 


7 


• / DoU 


^uz 000 


0 


3uZ*XZlU 


4t7*f ••fD**U 


c 
D 






/. 
*^ 


Zoo • OoOZ 


Z / 0 • xoox 




Z /O • oDOl 


QfiR9 
^07 • 700^ 


z 


Z /U • 7D0 / 


zox • o^zx 


1 


ZbZ. 7H7^ ' 


Z2^^ • XOO7 


0 


OCA fiAOQ 


OA A qSQc; 


-1 


0 A £ £ 0 A C 
ZHD • 03UD 


z^3 • yyjjyj 


-z 


0^0 ^17A 
ZJo • Ji./U 


007 lAAT 


. -3 


00 Q Q1 OA 
ZZ7 • ^XOH- 


^^XO . ^W^U 




001 A*7QA 
ZZi..H/7^^ 


OAQ A'^'^Q 
^U7 • w^^7 


-5 


213.0339 


201.7735. 


-6 


204. 7787 


193.3801 


-7 


198.1509 


183.7171 


-8 


188.5125 


174.4080 


• -9 


179.5749 


165.0988 


-10 


170.6372 


155.7897 


-11 


161.6994 


146.4806 


-12 


152-7617 


137.1715 


-13 


143.8241 


127.8624 


-14 


134.8864 


118.5532 


-15 


125.9487 


109.2442 


-16 


117.0110 




-17 


108.0734 







358.0503 


355.3830 


5S7 7554 


349.9181 


347.3057 




341.8397 


339.2767 


J^X • / ^ 


333.7996 


331.2804 




325.7807 


323.2994 


35*1 1151 

•9X«/ . XX^X 


317.7652 


315.3163 


317 0011 
^x / • wxx 


309.7339 


307.3116 


308 8711 

• 0 / XX 


301.6675 


299.2664 


300 7011 


293.5466 


291.1618 


5Q5 4725 


285.3553 


282.9829 


28^. 1695 


277.0789 


274.7153 


275 7766 

• / /WW 


268.7043 


266.3469 


267 2796 


260.2232 


257.8711 


258 6704 

A»W W « W / W"T 


251.6294 


249 . 2824 


249 9444 


242.9247 


240.5848 


241 1013 


234.1227 


231.7965 


232.1549 


225.2535 


222.9530 


00*1 1A7/; 

ZZw • x^/ 0 


216 3907 

XXU * w7w / 


214.1385 


214.1239 


207.9487 


205.8903 


205.2777 


199.3289 


197.3459 


198.0721 • 


190.894S 


188.5021 


187.8261 


181.1171 


178.8807 


178.2752 


171.6870 


169.4910 


168.7242 


162.2569 


160.1012 


159.1731 


152.8269 


150.7114 


149.6222 


143.3968 


141.3217 


140.0713 


133.9668 


131.9320 


130.5203 


124.5367 


122.5422 


120.9693 


115.1067 


113.1525 


U1.4183 




103.7628 
94.3731 



Table 9

COMPARISONS OF SAT-VERBAL
PAPER-AND-PENCIL AND CAT CONVERSIONS

FORMULA   PAPER-AND-PENCIL SCALED SCORE   CAT SCALED SCORE                REPORTED SCORE
SCORE     UNROUNDED           REPORTED    UNROUNDED           REPORTED    DIFFERENCE¹

85        765.7534 (795.1)    800         745.3607 (795.1)    800         0
84        760.3660 (775.1)    780         738.5434 (775.1)    780         0
83        749.7511 (755.1)    760         731.5802 (755.1)    760         0
82        742.5784 (745.1)    750         724.3010 (735.1)    740         10
81        735.7382            740         717.3147            720         20
80        728.1016            730         710.6981            710         20
79        720.4621            720         702.8536            700         20
78        713.5629            710         694.0555            690         20
77        706.8445            710         685.9762            690         20
76        697.2582            700         678.4759            680         20
75        687.9710            690         671.2599            670         20
74        680.6924            680         664.1334            660         20
73        672.8395            670         656.6791            660         10
72        665.6194            670         649.0992            650         20
71        658.1815            660         641.8611            640         20
70        649.7708            650         635.2978            640         10
69        642.3246            640         628.9879            630         10
68        635.2370            640         622.0976            620         20
67        629.2920            630         614.6436            610         20
66        622.1838            620         607.5775            610         10
65        613.6360            610         601.1877            600         10
64        606.7781            610         595.1293            600         10
63        600.0319            600         589.0252            590         10
62        594.0447            590         582.4352            580         10
61        587.6907            590         575.9191            580         10
60        580.4023            580         569.8791            570         10
59        573.9099            570         564.2669            560         10
58        567.6985            570         558.6838            560         10
57        562.1671            560         552.3864            550         10
56        556.1771            560         545.8825            550         10
55        549.1594            550         539.7519            540         10
54        542.6275            540         534.0920            530         10
53        536.4192            540         528.6930            530         10
52        530.9346            530         522.7156            520         10
51        525.1453            530         516.2421            520         10
50        518.2675            520         510.0231            510         10
49        511.6707            510         504.2893            500         10
48        505.4827            510         498.9750            500         10
47        500.0655            500         493.3535            490         10
46        494.5087            490         487.1204            490         0
45        487.8736            490         480.9324            480         10
44        481.3245            480         475.2022            480         0
43        475.3060            480         469.9194            470         10
42        469.9502            470         464.4607            460         10
41        464.4627            460         458.1840            460         0
40        457.6849            460         451.6545            450         10
39        450.8045            450         445.5013            450         0
38        444.4807            440         439.8254            440         0
37        438.7470            440         434.0837            430         10
36        432.7595            430         427.6083            430         0
35        425.7150            430         420.9385            420         10
34        418.8243            420         414.4765            410         10
33        412.1846            410         408.3767            410         0
32        406.0062            410         402.2235            400         10
31        399.4599            400         395.2698            400         0
30        391.6911            390         388.0371            390         0
29        384.3955            380         381.1822            380         0
28        377.4457            380         374.7455            370         10
27        371.0754            370         368.4502            370         0
26        364.2633            360         361.5241            360         0
25        356.3665            360         354.1089            350         10
24        349.0457            350         347.0733            350         0
23        341.7375            340         340.1737            340         0
22        334.9682            330         333.5764            330         0
21        327.7970            330         326.6386            330         0

¹Paper-and-pencil reported minus CAT reported.



Table 9 (cont.)

COMPARISONS OF SAT-VERBAL
PAPER-AND-PENCIL AND CAT CONVERSIONS

FORMULA  PAPER-AND-PENCIL SCALED SCORE   CAT SCALED SCORE             DIFFERENCE IN
SCORE    UNROUNDED          REPORTED     UNROUNDED          REPORTED  REPORTED SCORES*

  20     [illegible]        320          [illegible]        320          0
  19     313.1123           310          [illegible]        310          0
  18     [illegible]        310          [illegible]        310          0
  17     [illegible]        300          [illegible]        300          0
  16     [illegible]        290          [illegible]        290          0
  15     [illegible]        280          [illegible]        280          0
  14     276.1484           280          [illegible]        280          0
  13     269.1728           270          [illegible]        270          0
  12     261.8190           260          263.2311           260          0
  11     254.2757           250          [illegible]        260        -10
  10     246.9199           250          [illegible]        250          0
   9     239.6942           240          [illegible]        240          0
   8     232.4461           230          234.9539           230          0
   7     224.7575           220          227.7137           230        -10
   6     217.1527           220          220.3694           220          0
   5     209.4242           210          212.9907           210          0
   4     202.0323           200          [illegible]        210        -10
   3     194.0106           200          [illegible]        200          0
   2     [illegible]        200          190.7103           200          0
   1     179.6673           200          [illegible]        200          0
   0     172.2797           200          177.1108           200          0
  -1     164.7299           200          [illegible]        200          0
  -2     156.9613           200          [illegible]        200          0
  -3     151.1256           200          [illegible]        200          0
  -4     143.7771           200          [illegible]        200          0
  -5     [illegible]        200          142.6485           200          0
  -6     [illegible]        200          135.5829           200          0
  -7     121.7319           200          128.5174           200          0
  -8     114.3835           200          121.4518           200          0
  -9     107.0351           200          [illegible]        200          0
 -10     99.6867            200          [illegible]        200          0
 -11     92.3333            200          100.2552           200          0
 -12     84.9900            200          93.1897            200          0
 -13     77.6416            200          86.1242            200          0
 -14     70.2932            200          79.0587            200          0
 -15     62.9448            200          71.9932            200          0
 -16     55.5964            200          64.9277            200          0
 -17     48.2480            200          57.8621            200          0
 -18     40.8996            200          50.7966            200          0
 -19     33.5512            200          43.7310            200          0
 -20     26.2028            200          36.6655            200          0
 -21     18.8544            200          29.6000            200          0
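The reported-score columns in Tables 9 and 10 are consistent with a simple rule: round each unrounded scaled score to the nearest multiple of 10, then truncate the result to the 200-800 reporting range. This section does not state the rule. The following Python sketch is therefore an inference from the tabled values and should not be read as the procedure actually used; the function names are illustrative. The few top-of-scale rows that carry a parenthetical value, such as 765.7534 (795.1), match the rule only when it is applied to the parenthetical value. How those values were obtained is not described here.

```python
import math

SCALE_MIN, SCALE_MAX = 200, 800  # reporting range seen in Tables 9 and 10


def reported_score(unrounded: float) -> int:
    """Round to the nearest 10 (halves round up), then truncate to 200-800.

    Inferred from the tabled values; the rounding rule is not quoted from the report.
    """
    rounded = math.floor(unrounded / 10.0 + 0.5) * 10
    return max(SCALE_MIN, min(SCALE_MAX, rounded))


def reported_difference(pp_unrounded: float, cat_unrounded: float) -> int:
    """Paper-and-pencil reported minus CAT reported, as in the table footnote."""
    return reported_score(pp_unrounded) - reported_score(cat_unrounded)


# Rows copied from Table 9 (SAT-Verbal): formula score -> (P&P, CAT) unrounded.
TABLE9_ROWS = {
    81: (735.7382, 717.3147),
    46: (494.5087, 487.1204),
    7: (224.7575, 227.7137),
    0: (172.2797, 177.1108),
}

for formula, (pp, cat) in TABLE9_ROWS.items():
    print(formula, reported_score(pp), reported_score(cat),
          reported_difference(pp, cat))
```

For the four rows shown, the sketch reproduces the tabled reported scores and the differences of 20, 0, -10 and 0 points. Rounding halves upward is an assumption. None of the legible tabled values falls exactly on a boundary, so the tables do not settle that point.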
Table 10

COMPARISONS OF SAT-MATH
PAPER-AND-PENCIL AND CAT CONVERSIONS

FORMULA  PAPER-AND-PENCIL SCALED SCORE   CAT SCALED SCORE             DIFFERENCE IN
SCORE    UNROUNDED          REPORTED     UNROUNDED          REPORTED  REPORTED SCORES*

  60     790.9530 (795.1)   800          814.0579           800          0
  59     781.3002           780          803.9611           800        -20
  58     773.3467           770          793.8643           790        -20
  57     762.8606           760          783.7675           780        -20
  56     753.1326           750          773.6707           770        -20
  55     743.6692           740          763.5739           760        -20
  54     734.1952           730          754.0232           750        -20
  53     724.5774           720          743.8553           740        -20
  52     714.7512           710          733.6725           730        -20
  51     704.7181           700          723.4575           720        -20
  50     694.4930           690          713.0899           710        -20
  49     684.1221           680          702.5322           700        -20
  48     673.6617           670          691.7996           690        -20
  47     663.1666           660          680.9362           680        -20
  46     652.6920           650          669.9930           670        -20
  45     642.2720           640          659.0261           660        -20
  44     631.9491           630          648.0858           650        -20
  43     621.7254           620          637.2137           640        -20
  42     611.6123           610          626.4321           630        -20
  41     601.6140           600          615.7628           620        -20
  40     591.7290           590          605.2131           610        -20
  39     581.9527           580          594.7842           590        -10
  38     572.2845           570          584.4767           580        -10
  37     562.7122           560          574.2845           570        -10
  36     553.2318           550          564.2030           560        -10
  35     543.8397           540          554.2268           550        -10
  34     534.5276           530          544.3458           540        -10
  33     525.2904           530          534.5542           530          0
  32     516.1234           520          524.8463           520          0
  31     507.0256           510          515.2182           520        -10
  30     497.9884           500          505.6619           510        -10
  29     489.0166           490          496.1809           500        -10
  28     480.1094           480          486.7706           490        -10
  27     471.2582           470          477.4257           480        -10
  26     462.4745           460          468.1541           470        -10
  25     453.7578           450          458.9538           460        -10
  24     445.1133           450          449.8305           450          0
  23     436.5426           440          440.7866           440          0
  22     428.0495           430          431.8275           430          0
  21     419.6389           420          422.9582           420          0
  20     411.3148           410          414.1815           410          0
  19     403.0786           400          405.5000           410        -10
  18     394.9312           390          396.9155           400        -10
  17     386.8722           390          388.4287           390          0
  16     378.9028           380          380.0379           380          0
  15     371.0172           370          371.7383           370          0
  14     363.2081           360          363.5233           360          0
  13     355.4675           360          355.3830           360          0
  12     347.7849           350          347.3057           350          0
  11     340.1481           340          339.2767           340          0
  10     332.5420           330          331.2804           330          0
   9     324.9519           320          323.2994           320          0
   8     317.3624           320          315.3163           320          0
   7     309.7580           310          307.3116           310          0
   6     302.1210           300          299.2664           300          0
   5     294.4354           290          291.1618           290          0
   4     286.6882           290          282.9829           280         10
   3     278.8661           280          274.7153           270         10
   2     270.9567           270          266.3469           270          0
   1     262.9495           260          257.8711           260          0
   0     254.8428           250          249.2824           250          0
  -1     246.6305           250          240.5848           240         10
  -2     238.3170           240          231.7965           230         10
  -3     229.9184           230          222.9530           220         10
  -4     221.4794           220          214.1385           210         10
  -5     213.0339           210          205.8903           210          0

*Paper-and-pencil reported - CAT reported
Table 10 (cont.)

COMPARISONS OF SAT-MATH
PAPER-AND-PENCIL AND CAT CONVERSIONS

FORMULA  PAPER-AND-PENCIL SCALED SCORE   CAT SCALED SCORE             DIFFERENCE IN
SCORE    UNROUNDED          REPORTED     UNROUNDED          REPORTED  REPORTED SCORES*

  -6     [illegible]        200          197.3459           200          0
  -7     198.1509           200          188.5021           200          0
  -8     188.5125           200          178.8807           200          0
  -9     179.5749           200          169.4910           200          0
 -10     170.6372           200          160.1012           200          0
 -11     161.6994           200          150.7114           200          0
 -12     152.7617           200          141.3217           200          0
 -13     143.8241           200          131.9320           200          0
 -14     134.8864           200          122.5422           200          0
 -15     125.9487           200          113.1525           200          0
 -16     117.0110           200          103.7628           200          0
 -17     108.0734           200          94.3731            200          0

*Paper-and-pencil reported - CAT reported
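The footnote defines the difference column as paper-and-pencil reported minus CAT reported, so the column can be rechecked from the two reported columns alone. The following Python sketch does this for the Table 10 rows from formula score 60 down to -5. Below -5, both conversions report 200, so every difference there is 0. The lists are transcribed from the table, and the names are illustrative.

```python
from collections import Counter

# Reported scores transcribed from Table 10 (SAT-Math), formula scores 60 down to -5.
FORMULA_SCORES = list(range(60, -6, -1))
PP_REPORTED = [
    800, 780, 770, 760, 750, 740, 730, 720, 710, 700,
    690, 680, 670, 660, 650, 640, 630, 620, 610, 600,
    590, 580, 570, 560, 550, 540, 530, 530, 520, 510,
    500, 490, 480, 470, 460, 450, 450, 440, 430, 420,
    410, 400, 390, 390, 380, 370, 360, 360, 350, 340,
    330, 320, 320, 310, 300, 290, 290, 280, 270, 260,
    250, 250, 240, 230, 220, 210,
]
CAT_REPORTED = [
    800, 800, 790, 780, 770, 760, 750, 740, 730, 720,
    710, 700, 690, 680, 670, 660, 650, 640, 630, 620,
    610, 590, 580, 570, 560, 550, 540, 530, 520, 520,
    510, 500, 490, 480, 470, 460, 450, 440, 430, 420,
    410, 410, 400, 390, 380, 370, 360, 360, 350, 340,
    330, 320, 320, 310, 300, 290, 280, 270, 270, 260,
    250, 240, 230, 220, 210, 210,
]


def reported_differences(pp, cat):
    """Paper-and-pencil reported minus CAT reported, row by row."""
    if len(pp) != len(cat):
        raise ValueError("reported-score columns must have equal length")
    return [p - c for p, c in zip(pp, cat)]


diffs = reported_differences(PP_REPORTED, CAT_REPORTED)
tally = Counter(diffs)
largest = max(abs(d) for d in diffs)
rows_at_largest = [f for f, d in zip(FORMULA_SCORES, diffs) if abs(d) == largest]

print(sorted(tally.items()))
print(largest, max(rows_at_largest), min(rows_at_largest))
```

On these rows, the recomputed differences agree with the printed column. Twenty rows differ by -20 points, 15 by -10 and 6 by +10, and 25 agree. All of the 20-point differences have the CAT conversion higher and fall at formula scores 59 through 40.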



IATP Computer Adaptive SAT Pilot



THE RESEARCH DESIGN 

The study is designed to establish comparable reported score scales for the
paper-and-pencil and the computerized adaptive versions of the IATP SAT. It
is very important that the study be conducted according to one of the designs
outlined below and that students complete both tests within two (2) weeks. We
ask that you choose one of the two designs and then test all students using
that design.

Design I: Counterbalanced Design

This design requires that half of the students testing on a given day take the
paper-and-pencil version first, while the second half testing on that day take
the computerized version first. The students testing on a specific day should
be divided into two groups of equal size in a random fashion. (We will supply
you with specific procedures for splitting the total group testing on a
specific day into subgroups in a random fashion at a later date. This is
critical to the success of the study.) Both tests will be administered on
the same day, possibly with a lunch break in between.

Design II: Modified Counterbalanced Design 

This design has the advantage of allowing you to test all students at the same
time with the paper-and-pencil test. Test center personnel, with the
knowledge of who will be tested beforehand, should randomly split the total
group to be tested into two equal-sized groups, Group A and Group B. (We will
supply you with specific procedures for splitting your total group into
subgroups in a random fashion at a later date. This is critical to the
success of the study.)

Group A students will be tested with the computer version of the IATP SAT for
as many days as needed to complete the computer-based testing. However,
computer testing may not occur more than two weeks prior to the
paper-and-pencil test administration. Groups A and B will then be brought
together to take the paper-and-pencil test.

At the conclusion of the paper-and-pencil test, Group A completes the
questionnaire about their experiences with the computerized test. Group B
participants may then begin computer-based testing after completing the
paper-and-pencil test. Testing should continue for as many days as are necessary to
test the entire group (but no longer than two weeks after the paper-and-pencil
test). Students in Group B complete the questionnaire immediately after
taking the computer-based test.



Figure 1: Designs for conducting the SAT CAT pilot/comparability study 
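The timing limits in Design II reduce to a date comparison for each examinee. The following Python sketch is a hypothetical helper and not part of the study materials. It assumes that "two weeks" means 14 calendar days. Because it compares dates only, it accepts a computerized session on the day of the paper-and-pencil test for either group and does not check the order of sessions within that day.

```python
from datetime import date

MAX_GAP_DAYS = 14  # assumption: "two weeks" read as 14 calendar days


def design2_dates_ok(group: str, cat_date: date, pp_date: date) -> bool:
    """Check a Design II examinee's CAT date against the shared paper-and-pencil date.

    Group A takes the computer test on or before the paper-and-pencil date,
    no more than two weeks earlier; Group B takes it on or after that date,
    no more than two weeks later.
    """
    gap = (pp_date - cat_date).days  # positive when the CAT came first
    if group == "A":
        return 0 <= gap <= MAX_GAP_DAYS
    if group == "B":
        return -MAX_GAP_DAYS <= gap <= 0
    raise ValueError(f"unknown group: {group!r}")


pp_day = date(2024, 3, 15)  # hypothetical paper-and-pencil administration date
print(design2_dates_ok("A", date(2024, 3, 1), pp_day))   # CAT 14 days before
print(design2_dates_ok("A", date(2024, 2, 29), pp_day))  # CAT 15 days before
print(design2_dates_ok("B", date(2024, 3, 29), pp_day))  # CAT 14 days after
```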
IATP Computer Adaptive SAT Pilot
Research Design/Random Assignment Guidelines 

PROCEDURES FOR SPLITTING TOTAL GROUP TO BE TESTED INTO SUBGROUPS 



DESIGN I: COUNTERBALANCED DESIGN 

This design requires that half of the examinees testing on a given day take 
the paper-and-pencil version first, while the second half testing on that day take 
the computerized version first. The examinees testing on a specific day should be 
divided into two groups of equal size in a random fashion. 

Procedure 



Condition A: If you are planning on running only two
sessions of computerized testing on a given day (one 
session before the paper-and-pencil test, one session 
after; ALL EXAMINEES TAKE THE PAPER-AND-PENCIL TEST 
TOGETHER) and: 

A1. You are allowing examinees to choose the day they
want to test, possibly by phone (i.e., you do not
have beforehand an intact roster of examinees to be
tested on a given day), then as you are contacted by
the examinees, alternate assignment to testing
orders. Assign the first examinee who contacts you
to the computer then paper-and-pencil order, the
second examinee who contacts you to the
paper-and-pencil then computer order, the third examinee to the
computer then paper-and-pencil order, etc., until all
slots are filled. If you are using five computers,
this means that your total group for that given day
will consist of 10 examinees, with 5 receiving the
computer then paper-and-pencil order and 5 the
paper-and-pencil then computer order. All examinees
testing on the given day should complete the
questionnaire after they have taken both tests.

ROSTER A2. You have beforehand an intact roster of examinees to 
be tested on a given day (this may be the case if you 
are testing local high school students). Alphabetize 
the roster and assign the first examinee listed in 
the alphabetized roster to the computer then paper- 
and-pencil order, the second listed to the paper-and- 
pencil then computer order, the third listed to the 
computer then paper-and-pencil order, etc. If you 
are using five computers, this means your total group 
for that given day will consist of 10 examinees, with
5 receiving the computer then paper-and-pencil order
and 5 receiving the paper-and-pencil then computer
order. The 5 receiving the computer then
paper-and-pencil order will be in positions 1, 3, 5, 7, and 9
on your roster for that day, while the 5 examinees
receiving the paper-and-pencil then computer order
will be in positions 2, 4, 6, 8, and 10 on your
roster. All examinees testing on the given day
should complete the questionnaire after they have
taken both tests.

Figure 2: Procedures for splitting SAT CAT total groups of examinees into randomly equivalent subgroups.

Condition B: If you are planning on running multiple
sessions of computerized testing on a given day, then you
must schedule the paper-and-pencil testing in the middle of
the day (ALL EXAMINEES TAKE THE PAPER-AND-PENCIL TEST
TOGETHER) and an equal number of computerized testing
sessions before and after the paper-and-pencil testing.
If:

B1. You are allowing the examinees to choose the day they
want to test, possibly by phone (i.e., you do not
have beforehand an intact roster of examinees to be
tested on a given day), then as you are contacted by
examinees, alternate assignments to testing orders.
That is, assign the first examinee who contacts you
to the computer then paper-and-pencil order. This
examinee is free to choose which of the sessions of
computerized testing before the paper-and-pencil
testing on that day he/she wants to attend. Assign
the second examinee who contacts you to the
paper-and-pencil then computer order. This examinee is
also free to choose which of the sessions of
computerized testing after the paper-and-pencil
testing on that day he/she wants to attend. The
third examinee who contacts you would be assigned to
the computer then paper-and-pencil order, etc. This
examinee and later examinees are free to choose which
of the appropriate sessions of computerized testing
before or after paper-and-pencil testing on that day
they want, provided that slots are open. Examinees
who contact you later in the process will have to be
assigned to a session of computerized testing that
still has open slots.

After you have completed assigning all examinees to
test orders and computerized testing sessions, you
should make sure that as many examinees are assigned to
sessions of computerized testing before paper-and-pencil
testing as are assigned to sessions of computerized
testing after paper-and-pencil testing. If not,
assign the last examinee who contacted you to
another day of testing.

All examinees testing on the given day should
complete the questionnaire after they have taken both
tests.

ROSTER B2. You have beforehand an intact roster of examinees
testing on a given day (this may be the case if you
are bringing examinees to your institution to be
tested on a specific day). Alphabetize the roster
and assign the first examinee listed in the
alphabetical roster to the computer then
paper-and-pencil order. Assign the second examinee listed in
the alphabetical roster to the paper-and-pencil then
computer order and the third examinee to the computer
then paper-and-pencil order, etc. As you assign
examinees to sessions of computerized testing, it
would be a good idea (but it isn't necessary) to fill
the sessions closest to the paper-and-pencil testing
first, to minimize the number of examinees who will
have a waiting period between testing sessions.

After you have completed assigning all examinees to
testing orders and computerized testing sessions, you
should make sure that as many examinees are assigned to
sessions of computerized testing before paper-and-pencil
testing as are assigned to sessions of computerized
testing after paper-and-pencil testing. (If the
totals differ by one, it means that you had an odd
number of examinees on your roster. This is okay for
testing purposes (i.e., go ahead and test everyone),
but we will be unable to use the data from the last
examinee assigned in the comparability study. Please
record the name of this examinee and provide it to
us. We will provide scores for that examinee.)

All examinees testing on the given day should
complete the questionnaire after they have taken both
tests.
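The first-come alternation described in A1 and B1, together with the balancing check in B1, can be sketched in Python as follows. The sketch assigns orders strictly in contact order. It leaves out session choice and slot limits, which the procedure handles separately. The function name and order labels are hypothetical.

```python
CAT_FIRST = "computer then paper-and-pencil"
PP_FIRST = "paper-and-pencil then computer"


def assign_by_contact(contacts):
    """Alternate testing orders in the order examinees make contact.

    Returns (assignments, moved), where moved lists any examinee shifted to
    another testing day so that both orders have the same number of examinees.
    """
    assignments = [
        (name, CAT_FIRST if i % 2 == 0 else PP_FIRST)
        for i, name in enumerate(contacts)
    ]
    n_cat_first = sum(1 for _, order in assignments if order == CAT_FIRST)
    n_pp_first = len(assignments) - n_cat_first
    moved = []
    if n_cat_first != n_pp_first:
        # Strict alternation leaves at most one extra examinee, and that
        # examinee is always the last one to make contact.
        moved.append(assignments.pop()[0])
    return assignments, moved


assignments, moved = assign_by_contact(["caller 1", "caller 2", "caller 3"])
for name, order in assignments:
    print(name, "->", order)
print("move to another day:", moved)
```

With three callers, the sketch assigns the first two to opposite orders and moves the third to another day, which is what the balancing rule in B1 asks for.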



DESIGN II: MODIFIED COUNTERBALANCED DESIGN 



This design has the advantage of allowing you to test all examinees at the
same time with the paper-and-pencil test. Because the total group to be tested
is known beforehand, it should be randomly split into two equally sized
groups, Group A and Group B. Group A examinees will take the computerized test
before the paper-and-pencil test, while Group B examinees will take the
paper-and-pencil test before the computerized test.



Procedure 

The paper-and-pencil testing session needs to be scheduled in the middle of your
testing period, so that you have an equal number of days before and after this
session for computerized testing. You may run one or multiple sessions of
computerized testing on those days.

If the total roster of examinees to be tested is not alphabetized, then alphabetize
it. The first examinee on the alphabetized roster should be assigned to Group A,
the second examinee to Group B, the third examinee to Group A, etc. You should end
up with an equal number of examinees in Groups A and B. If you do not, and are off
by one examinee (i.e., your total group roster had an odd number of examinees), go
ahead and test everyone, but keep a record of the name of the last examinee assigned
and provide it to us. We will not be able to use the data from that examinee in the
comparability study, but we will provide scores for that examinee.

After you have split the total group into Groups A and B, you may assign or allow
examinees to select the sessions when they take the computerized test. All Group A
examinees must, however, take the computerized test before they take the
paper-and-pencil test. All Group B examinees must take the computerized test after
they take the paper-and-pencil test. Group A examinees should complete the
questionnaire immediately following the paper-and-pencil test. Group B examinees
should complete the questionnaire immediately following the computerized test.
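The alphabetized-roster rule used in A2, B2 and Design II amounts to taking alternate names from the sorted roster. The following Python sketch is illustrative only. It assumes that roster entries are strings that sort correctly as written (for example, "Surname, Given"), sorts them without regard to case, and uses made-up names. The first group takes the computer test first, which corresponds to Group A in Design II.

```python
def split_alphabetized_roster(roster):
    """Split a roster into two testing groups by alternating alphabetical position.

    Positions 1, 3, 5, ... (computer test first; Group A in Design II) go to the
    first group, and positions 2, 4, 6, ... (paper-and-pencil first; Group B) go
    to the second. With an odd roster, the last examinee assigned is also
    returned separately: that examinee is still tested, but the procedure in
    Figure 2 notes that the data cannot be used in the comparability study.
    """
    ordered = sorted(roster, key=str.casefold)
    computer_first = ordered[0::2]
    paper_first = ordered[1::2]
    excluded = ordered[-1] if len(ordered) % 2 == 1 else None
    return computer_first, paper_first, excluded


roster = ["Evans, E.", "Adams, A.", "Chen, C.", "Baker, B.", "Diaz, D."]
group_a, group_b, excluded = split_alphabetized_roster(roster)
print("Computer first:", group_a)
print("Paper-and-pencil first:", group_b)
print("Tested, but not used in the study:", excluded)
```

The fifth name on the sorted roster is tested with the first group and is also returned as the examinee whose data would be set aside. This mirrors the instruction to test everyone but to record that examinee's name.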
Figure 3

OUTLIER ANALYSIS: VERBAL - P/P FIRST
2-WAY DISTRIBUTION OF CAT & PP TESTS

Figure 5

OUTLIER ANALYSIS: MATH - P/P FIRST
2-WAY DISTRIBUTION OF CAT & PP TESTS

Figure 7: Grouped frequency distributions of SAT-V formula scores

Figure 8: Grouped frequency distributions of SAT-M formula scores

Figure 9: Raw-to-raw SAT-V difference plots, constructed using the linear conversion as the criterion.

Figure 10: Raw-to-raw SAT-M difference plots, constructed using the linear conversion as the criterion.



